WEM High Availability Redundancy Installations

Systems that run Web Element Manager on a single server to manage their networks risk service disruption if that server fails. By using Oracle Solaris Cluster or Symantec Veritas Cluster Server (VCS) software, you can create redundant Web Element Manager servers: a primary host server runs an active instance of Web Element Manager, and a redundant server stays in standby mode. This appendix provides information to help you configure redundant instances of Web Element Manager across multiple servers. Use it together with the Installing the WEM Software and WEM Port and Hardware Information chapters in this guide.
Important: Oracle Solaris Cluster is supported only on the Solaris operating system, whereas Veritas Cluster Server is supported on both Solaris and RHEL. During the installation process, a radio button is provided to choose the required software.
Configuring High Availability Redundancy Using Solaris Cluster Software
This section describes the installation, configuration, and upgrade procedures for High Availability on servers running the Solaris OS. You should also refer to the Solaris documentation. Wherever this guide appears to conflict with the official Solaris documentation, the Solaris documentation takes precedence.
System Requirements
Requirements for implementing High Availability are as follows:
Web Element Manager must be installed on a minimum of two Sun Netra™ T5220 servers equipped with the hardware described in the Server Hardware Requirements section of this guide.
We recommend restricting the cluster to two servers, configured similarly to the diagram below. The sample Oracle Solaris Cluster configurations in this appendix assume such an installation.
Important: Ensure you have installed the latest version of Oracle Solaris software and all appropriate software patches as described in the Operating System Requirements section.
IPMP (IP Network Multipathing) is a feature of Oracle Solaris. For more complete configuration information, refer to Configuring IPMP for WEM Server and to the Oracle product documentation.
Oracle Solaris Cluster is a feature provided and supported by Oracle. For more complete information on configuring Resource Groups, refer to the Oracle Solaris Cluster product documentation.
Installing Web Element Manager for Failover Mode
This section specifies the configuration changes required when installing WEM in Failover Mode rather than Standalone Mode when following the installation instructions in the Installing the WEM Software chapter. For this release, please use the GUI to perform the installation rather than the command line.
Important: Install and configure Web Element Manager in Failover Mode on both servers before configuring a cluster resource group.
The following items are either different from, or prerequisites for, the installation steps defined in the Installing the WEM Software chapter:
Create an installation directory path <ems_dir>, or use the default path: /users/ems.
The logical hostname and a floating IP address shared between the two nodes must be configured in /etc/hosts on both nodes. The examples in the rest of this appendix use ems-service as the logical hostname.
Important: The following options are not set when installing in Failover Mode:
WEM Service started by default and monitored by Process Monitor. (See the WEM Process Monitor chapter for more information on processes.)
Creating and Configuring a Cluster Resource Group
This section explains how to create a Resource Group specifically for WEM servers in this cluster and configure it appropriately.
Important: Perform this procedure on only one server in the cluster; the resulting configuration applies to both servers.
Creating a Resource Group
The following procedure describes how to create a Resource Group.
We recommend adding the cluster binary path to the shell environment so that you can run the cluster commands from any directory.
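As a sketch, the following lines add the cluster command directory to PATH, for example in root's shell profile. They assume the default Solaris Cluster location, /usr/cluster/bin; adjust it if your installation differs.

```shell
# Assumed default location of the Solaris Cluster commands
# (clsetup, clresourcegroup, scstat, and so on).
CLUSTER_BIN=/usr/cluster/bin

# Append it to PATH only if it is not already there.
case ":$PATH:" in
  *":$CLUSTER_BIN:"*) ;;
  *) PATH="$PATH:$CLUSTER_BIN" ;;
esac
export PATH
```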
Before clsetup can create a network resource for a logical hostname, that hostname and the common floating IP address associated with it must be specified in the /etc/hosts file on both servers. This example uses ems-service as the logical hostname.
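A quick way to confirm that entry on each node is sketched below. check_logical_host is a hypothetical helper, not part of WEM or Solaris Cluster; it prints the address that a hosts file (by default /etc/hosts) maps to the given name, so you can compare the output on both servers.

```shell
# check_logical_host NAME [HOSTS_FILE]
# Hypothetical helper. Prints the IP address mapped to NAME in HOSTS_FILE
# (default /etc/hosts) and returns 0, or prints an error and returns 1.
check_logical_host() {
  name=$1
  hosts_file=${2:-/etc/hosts}
  # Skip comment lines; match NAME in any hostname or alias column.
  ip=$(awk -v n="$name" '
    /^[ \t]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
  ' "$hosts_file")
  if [ -n "$ip" ]; then
    echo "$ip"
    return 0
  fi
  echo "$name not found in $hosts_file" >&2
  return 1
}
```

For example, run check_logical_host ems-service on both nodes; both should print the same shared floating IP address.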
Step 1
Log in as root and run clsetup to open the Main Menu.
Step 2
Step 3
A resource group is a container into which you can place resources of various types, such as network and data service resources, and then manage them together. Only failover resource groups can contain network resources; a logical hostname is one example of a network resource.
Step 4
When prompted to create a failover group, enter yes and select Option 1: Create a Failover Group. For this example, call the group ems-rg.
Step 5
When prompted to select a preferred server, enter yes and enter the name of the preferred server; for this example, use Node1. Enter yes to continue the update.
The screen will display the following message:
clresourcegroup create -n <Node-1 Node Name>,<Node-2 Node Name> ems-rg
Command completed successfully.
With the Resource Group created successfully, you can move on to the next step and add the logical hostname.
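The clsetup dialog above runs a single clresourcegroup command. For scripted setups, the sketch below issues the equivalent command directly. create_failover_rg is a hypothetical wrapper, and it assumes clresourcegroup takes the -n node list as comma-separated names, with the preferred node listed first.

```shell
# create_failover_rg NODE1 NODE2 [GROUP]
# Hypothetical wrapper. Creates a failover resource group (default ems-rg)
# whose node list is NODE1,NODE2; NODE1 comes first and is preferred.
create_failover_rg() {
  clresourcegroup create -n "$1,$2" "${3:-ems-rg}"
}
```

For example: create_failover_rg Node1 Node2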
Adding a Logical Hostname to a Failover Resource Group
Follow steps 1 - 5 to add a logical hostname.
Step 1
After the confirmation screen from the last task displays, press Enter to continue. Enter yes when prompted to add network resources.
Step 2
If a failover resource group contains logical hostname resources, the most common configuration is to have one logical hostname resource for each subnet. Enter 1 to create a single resource.
Step 3
Step 4
Press Enter to continue. The screen displays:
clreslogicalhostname create -g ems-rg -p R_description="LogicalHostname resource for ems-service" ems-service
Step 5
Enter no when prompted to add any additional network resources.
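Similarly, the logical hostname resource can be added with the clreslogicalhostname command that clsetup displays in Step 4. add_logical_host is a hypothetical wrapper around that command; as in the clsetup output, the resource takes the same name as the hostname.

```shell
# add_logical_host HOSTNAME [GROUP]
# Hypothetical wrapper. Adds a LogicalHostname resource for HOSTNAME to
# GROUP (default ems-rg). HOSTNAME must already be in /etc/hosts on both nodes.
add_logical_host() {
  clreslogicalhostname create -g "${2:-ems-rg}" \
    -p R_description="LogicalHostname resource for $1" "$1"
}
```

For example: add_logical_host ems-service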
Adding a Data Service Resource
Follow steps 1 - 4 to add a data service.
Step 1
After the logical hostname confirmation screen, enter yes when prompted to begin adding data services.
Step 2
From the Data Services Menu select Option 1: EMSSCFO Server for Sun Cluster, and use the name ems-dsr for this example.
The screen displays the following message:
This data service uses the "Port_list" property. The default "Port_list" for this data service is as follows: <NULL>
Step 3
Enter no when prompted to override the default.
Step 4
Enter no when prompted to add more properties, then enter yes to continue.
The screen displays the following message:
Commands completed successfully
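clsetup does not display the command it runs for the data service resource. As a sketch only, a resource is normally created in an existing group with clresource create. This guide does not give the registered resource type name for the EMSSCFO data service, so add_data_service (a hypothetical wrapper) takes it as its first argument; look up the actual name with clresourcetype list before using it.

```shell
# add_data_service RESOURCE_TYPE [RESOURCE] [GROUP]
# Hypothetical wrapper. RESOURCE_TYPE is a placeholder for the registered
# EMSSCFO type name reported by `clresourcetype list`. RESOURCE defaults
# to ems-dsr and GROUP to ems-rg; Port_list keeps its default value.
add_data_service() {
  clresource create -g "${3:-ems-rg}" -t "$1" "${2:-ems-dsr}"
}
```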
Bringing the Resource Group Online
Follow steps 1 - 2 to bring the Resource Group online.
Step 1
After the completion confirmation screen, press Enter to continue. Enter no when prompted to add any additional data service resources. Enter yes when prompted to manage and bring this resource group online.
The screen displays the following message:
clresourcegroup online -M ems-rg
Commands completed successfully
Step 2
Press Enter to continue, then select Option q to Quit and return to the Main Menu.
The process is now complete. At this point you can enter the scstat command to display the current online/offline status if required.
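Bringing the group online can be scripted in the same way. This sketch (bring_rg_online is a hypothetical name) runs the clresourcegroup online -M command shown above and then prints the group status with clresourcegroup status, which serves the same purpose here as scstat.

```shell
# bring_rg_online [GROUP]
# Hypothetical wrapper. Puts GROUP (default ems-rg) into the managed
# state and brings it online, then prints its status.
bring_rg_online() {
  group=${1:-ems-rg}
  clresourcegroup online -M "$group" || return 1
  clresourcegroup status "$group"
}
```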
Upgrading Web Element Manager in a Clustered Environment
This section describes the process for upgrading Web Element Manager in a two-server cluster.
Important: Before you begin, have all connected clients log out. If clients cannot reconnect after the upgrade, refer to the Troubleshooting appendix for information on Java-related errors.
Prerequisite Steps for the Upgrade Process
For the example configuration that follows you should confirm the following:
This can be confirmed either by a software switchover, or by running the scstat command to confirm the current node status.
Two Cluster Nodes: N-1 (initially this is the active node) and N-2 (initially this is the redundant node).
Upgrade Process Overview
This section provides a broad overview of the procedures to follow.
Step 1
Start with two server nodes: N-1 and N-2. They share both data files and database information. N-1 is currently active and N-2 is currently in standby mode, as shown below:
Step 2
With Web Element Manager running on N-1, put N-2 into maintenance mode (as described in Removing an Inactive Node from the Resource Group below). Then upgrade WEM on N-2 to version 12.0 or later.
When WEM is upgraded on N-2 in cluster mode, the PostgreSQL database is not started, so no new database schema is created at that point. Instead, the schema is updated from the currently active server, N-1, using SQL scripts provided in the installation package, as explained in Updating the Databases. After that step, N-2 is still in maintenance mode but runs the updated WEM software, and the shared database schema has been updated, as shown below:
Step 3
To prepare for the switchover, take N-2 out of maintenance mode.
Step 4
To prepare for upgrading WEM on N-1, bring N-2 back into the cluster and switch over from N-1 to N-2. N-2 is now the active host, and WEM starts running its processes there.
Step 5
Now place the formerly active node, N-1, into maintenance mode and upgrade its WEM software while N-2 is the active node, as shown below:
Step 6
Once the upgrade is complete, N-1 can be switched back to its role of active node if necessary by using the Cluster commands, or N-2 can continue as the active node.
Important: Refer to the Installing the WEM Software chapter for more complete information about scripts and files. All configuration files in the <EMS_HOME>/server/etc folder and all script files in the <EMS_HOME>/server/scripts folder must be identical on both nodes. This ensures that EMS behaves exactly the same way after a switch between servers.
Removing an Inactive Node from the Resource Group
Complete the following steps to remove N-2 from the Resource Group. Because the cluster resource group configuration is the same on both nodes, you can run the cluster-related commands on either node.
Step 1
Run the scstat command. scstat shows the current status of the cluster and its resource groups, which lets you confirm that the servers will switch correctly on switchover or failover. A properly configured cluster shows the following:
Two cluster nodes: Online
Two cluster transport paths: Online
Quorum votes by node: Online
Quorum votes by device: Online
Resource Groups and Resources: ems-rg, ems-service, ems-dsr
ems-rg group: N-1 Online, N-2 Offline
IPMP groups: Online
Important: N-2 must not run any WEM processes during the upgrade, so that it cannot take ownership of cluster resources. Removing N-2 from the Resource Group prevents a failover, and N-1 continues to operate like a standalone WEM server, which allows the upgrade to complete successfully. To remove N-2:
Step 2
Enter clsetup to open the Main Menu and select Option 2: Resource Groups Menu.
Step 3
From the Resource Groups menu, select Option 8: Change the Properties of a Resource Group.
Step 4
Enter yes when prompted to continue.
Step 5
Step 6
Select Option 1: Change the Nodelist Resource Group Properties
Step 7
Enter yes when prompted to continue. Both N-1 and N-2 should now appear in the nodelist.
Step 8
Select Option 2: Remove a Node from the Nodelist, then select Option 1 to remove N-2.
The nodelist now contains only N-1. When prompted to update the nodelist property, enter yes. If your update was successful you will see the following screen confirmation:
command completed successfully
Press Enter to continue. You will receive confirmation that only N-1 remains in the nodelist. Select Option q to Quit and exit back to the Resource Group Menu.
Step 9
From the Resource Group Menu select Option s: Show Current Status to confirm the current network resources if required.
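For reference, the same nodelist change can be made without clsetup by using the clresourcegroup remove-node subcommand. The sketch below assumes that subcommand is available on your Solaris Cluster release; remove_node_from_rg is a hypothetical name.

```shell
# remove_node_from_rg NODE [GROUP]
# Hypothetical wrapper. Removes NODE from the node list of GROUP
# (default ems-rg) so the group cannot fail over to it, then prints the
# group status (comparable to Option s: Show Current Status).
remove_node_from_rg() {
  group=${2:-ems-rg}
  clresourcegroup remove-node -n "$1" "$group" || return 1
  clresourcegroup status "$group"
}
```

For example: remove_node_from_rg N-2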
Upgrading WEM on the Inactive Server
Complete the following steps to upgrade WEM on the inactive server, N-2.
Step 1
Updating PostgreSQL config file...This is an upgrade in Cluster mode; not updating postgres config.
This message is normal because the database is to be updated from the active server, N-1.
Step 2
Updating the Databases
Complete the following steps to update the databases from node N-1.
Step 1
Copy the sqlfiles.tar file from the N-2 installation to a folder on N-1 and untar the file. This process is described fully in the Installing the WEM Software chapter.
This will create a folder called sqlfiles.
Step 2
Go to the sqlfiles folder and run dbClusterUpgrade.sh.
Step 3
Step 4
Step 5
Step 6
Press Enter.
Step 7
Important: Currently, this is port 5432, but this may change in a later release.
Step 8
Database schema upgraded successfully...
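The extract-and-run part of this procedure can be sketched as follows. The sketch assumes you run it on N-1 from the folder to which you copied sqlfiles.tar; update_db_schema is a hypothetical name, and the location of sqlfiles.tar in the N-2 installation is described in the Installing the WEM Software chapter, not here. dbClusterUpgrade.sh remains interactive, so answer its prompts as described in the steps above.

```shell
# update_db_schema SQL_TAR
# Hypothetical helper, run on the active node N-1 from the folder that
# holds the copied sqlfiles.tar. Extracts the archive (creating ./sqlfiles)
# and runs dbClusterUpgrade.sh from inside that folder.
update_db_schema() {
  tar xf "$1" || return 1
  [ -d sqlfiles ] || { echo "sqlfiles folder not found" >&2; return 1; }
  # Subshell, so the caller's working directory is unchanged afterwards.
  ( cd sqlfiles && ./dbClusterUpgrade.sh )
}
```

For example: update_db_schema sqlfiles.tar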
Returning the Inactive Node to the Resource Group
Complete the following steps to return N-2 to the Resource Group so that it can take over resource ownership while the software on N-1 is upgraded.
Step 1
Log in as root, run clsetup to open the Main Menu, and select Option 2: Resource Groups.
Step 2
From the Resource Groups Menu select Option 8: Change the Properties of a Resource Group.
Step 3
Enter yes when prompted to continue.
Step 4
Step 5
Step 6
Step 7
Select Option 1: Add a Node/Zone to the Top of the Nodelist.
Step 8
Step 9
Enter yes when prompted to update the nodelist property.
The screen will display the following message:
Command completed successfully.
Step 10
Press Enter to continue and select Option q to Quit and return to the Resource Groups Menu.
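The nodelist edit above can also be expressed as a single clresourcegroup set command that rewrites the Nodelist property with N-2 listed first. This is a sketch under the assumption that your release accepts setting Nodelist directly; move_node_to_top is a hypothetical name.

```shell
# move_node_to_top NODE OTHER_NODE [GROUP]
# Hypothetical wrapper. Sets the Nodelist property of GROUP (default ems-rg)
# to NODE,OTHER_NODE so that NODE is listed first.
move_node_to_top() {
  clresourcegroup set -p Nodelist="$1,$2" "${3:-ems-rg}"
}
```

For example: move_node_to_top N-2 N-1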
Switching Active Servers
Complete the following steps to make N-2 the active node so N-1 can be updated.
 
Step 1
Step 2
Step 3
Step 4
Select Option 1: Switch Group Ownership.
Step 5
Select the node to take ownership of ems-rg, which would be N-2. Enter yes to confirm. The screen will display the following message:
Command completed successfully.
Step 6
Press Enter to continue and select Option q to Quit and return to the Resource Group Menu.
Step 7
From the Resource Group Menu, select Option s: Show Current Status. This shows that N-2 is now online and N-1 is offline.
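A scripted equivalent of this switchover, assuming the clresourcegroup switch and status subcommands of Solaris Cluster, is sketched below; switch_rg is a hypothetical name.

```shell
# switch_rg NODE [GROUP]
# Hypothetical wrapper. Switches GROUP (default ems-rg) so that it is
# online on NODE, then prints the group status to confirm the change.
switch_rg() {
  group=${2:-ems-rg}
  clresourcegroup switch -n "$1" "$group" || return 1
  clresourcegroup status "$group"
}
```

For example: switch_rg N-2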
At this point, return to Removing an Inactive Node from the Resource Group and repeat the procedure to upgrade N-1, which is now the inactive node.
Important: Because the database schema was already updated and N-1 and N-2 share the same database, you do not need to run the SQL scripts again for N-1.
High Availability Mode Using Veritas Cluster Software (VCS)
This section provides instructions specific to a Symantec VCS installation that provides redundancy for multiple WEM servers. VCS is documented by Symantec; you will also need to refer to the Install, Uninstall, and Upgrade chapters in this guide. Server hardware requirements are in the WEM Port and Hardware Information chapter.
Important: Veritas Cluster Server is supported both on Sun servers running the Solaris operating system and on Cisco UCS servers running RHEL. The VCS installation in this section is intended for the RHEL platform. VCS has much in common with the Solaris installation described in the previous section; however, IPMP is a Solaris feature and is supported only on Solaris. A radio button on the installation screen lets you choose between a Solaris and a RHEL installation; for this reason, use the GUI rather than the command line to perform the installation.
Important: Installing WEM in Failover Mode rather than Standalone Mode requires some configuration changes. These are described in the Installing the WEM Software chapter.
Installation
Refer to the relevant documentation to install the appropriate operating system on the servers.
Refer to the VCS documentation for the following steps:
1.
2.
An example of a valid main.cf configuration is shown below.
These resources must be online when you install the WEM application on each node. WEM is managed as an Application resource, whose status VCS monitors through the PID file of psmon (the Monitor server). If the WEM application resource fails, VCS first tries to restart WEM on the same node before switching over to the standby node.
3.
4.
Important: Make certain that the WEM application does not start after the installation is complete.
5.
6.
7.
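The Application resource described above is monitored through the psmon PID file. To check the same condition by hand on the active node, you can use a sketch like the following. It only approximates what the VCS Application agent does (it tests whether the PID recorded in the file belongs to a running process), and wem_is_running is a hypothetical name. Run it as root, because kill -0 can fail with a permission error for processes owned by other users.

```shell
# wem_is_running [PID_FILE]
# Hypothetical helper. Returns 0 if the PID recorded in PID_FILE (default:
# the psmon PID file from the main.cf example) belongs to a running process;
# returns 1 if the file is missing or unreadable, holds no valid PID, or the
# process is not running.
wem_is_running() {
  pid_file=${1:-/users/ems/server/psmon.pid}
  [ -r "$pid_file" ] || return 1
  pid=$(head -n 1 "$pid_file")
  case $pid in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
  esac
  # Signal 0 only checks that the process exists; it does not affect it.
  kill -0 "$pid" 2>/dev/null
}
```

For example: wem_is_running && echo online || echo offline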
Main.cf File Configuration Example
The following is an example of the main.cf file for resource-groups and resources.
group wemFailover (
    SystemList = { pnstextappsucs1 = 0, pnstextappsucs3 = 1 }
    AutoStartList = { pnstextappsucs3 }
    )

    Application wemService (
        StartProgram = "/users/ems/postgres//bin/emsctl start"
        StopProgram = "/users/ems/postgres//bin/emsctl forcestop"
        PidFiles = { "/users/ems/server/psmon.pid" }
        RestartLimit = 1
        )

    DiskGroup wemDG (
        DiskGroup = wemdg
        )

    IP wemIP (
        Device = eth0
        Address = "10.4.83.151"
        NetMask = "255.255.255.0"
        )

    Mount wemMount (
        MountPoint = "/apps/wem/"
        BlockDevice = "/dev/vx/dsk/wemdg/wemvol"
        FSType = vxfs
        FsckOpt = "-y"
        )

    NIC wemNIC (
        Device = eth0
        )

    Volume wemVolume (
        DiskGroup = wemdg
        Volume = wemvol
        )

    wemIP requires wemNIC
    wemMount requires wemVolume
    wemService requires wemIP
    wemService requires wemMount
    wemVolume requires wemDG

    // resource dependency tree
    //
    //  group wemFailover
    //  {
    //  Application wemService
    //      {
    //      IP wemIP
    //          {
    //          NIC wemNIC
    //          }
    //      Mount wemMount
    //          {
    //          Volume wemVolume
    //              {
    //              DiskGroup wemDG
    //              }
    //          }
    //      }
    //  }
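Before the cluster loads an edited main.cf, it is worth checking its syntax. The sketch below wraps the VCS hacf -verify command; it assumes the default configuration directory /etc/VRTSvcs/conf/config, and verify_vcs_config is a hypothetical name.

```shell
# verify_vcs_config [CONFIG_DIR]
# Hypothetical wrapper. Checks main.cf (and the types files it includes)
# in CONFIG_DIR for syntax errors; the default is the standard VCS
# configuration directory.
verify_vcs_config() {
  hacf -verify "${1:-/etc/VRTSvcs/conf/config}"
}
```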
Upgrading WEM with VCS
The process for upgrading WEM installed in HA mode with VCS is similar to the Sun Cluster upgrade process described earlier.
With VCS, to prevent WEM from running on the standby node during the upgrade, disable the resource group on that node (system) with hagrp -disable. This ensures that the resource group cannot fail over during the upgrade and cause data corruption.
1.
$ hagrp -disable <resource-group name> -sys <node2>
2.
$ hagrp -switch <resource-group name> -to <system>
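The two upgrade commands can be wrapped as shown below. This sketch assumes that, as with other VCS attribute changes, the configuration must be made writable with haconf -makerw before a group is enabled or disabled, and saved with haconf -dump -makero afterwards. It also assumes the group must be re-enabled on the standby node before it can be switched there, mirroring Returning the Inactive Node to the Resource Group in the Solaris procedure. All function names are hypothetical.

```shell
# with_writable_config CMD [ARGS...]
# Makes the VCS configuration writable, runs CMD, then saves the
# configuration and makes it read-only again. Returns CMD's exit status.
with_writable_config() {
  haconf -makerw
  "$@"
  rc=$?
  haconf -dump -makero
  return $rc
}

# vcs_disable_standby GROUP STANDBY_NODE
# Step 1: keep GROUP from failing over to STANDBY_NODE during the upgrade.
vcs_disable_standby() {
  with_writable_config hagrp -disable "$1" -sys "$2"
}

# vcs_switch_to GROUP TARGET_NODE
# Step 2: re-enable GROUP on TARGET_NODE (assumed necessary before a
# switch) and move the group there.
vcs_switch_to() {
  with_writable_config hagrp -enable "$1" -sys "$2" || return 1
  hagrp -switch "$1" -to "$2"
}
```

For example: vcs_disable_standby wemFailover node2, then, after upgrading node2, vcs_switch_to wemFailover node2.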
Uninstalling WEM with VCS
Use the following steps to uninstall WEM in redundant mode.
1.
This ensures that the resource group cannot fail over during the uninstall process and cause data corruption.
$ hagrp -disable <resource group name> -sys <node2>
2.
$ hares -modify <resource-name> Enabled 0
3.
This ensures that the resource group cannot fail over during the uninstall process and cause data corruption.
$ hagrp -enable <resource group name> -sys <node2>
$ hagrp -disable <resource group name> -sys <node1>
4.
5.
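Steps 1 to 3 above can be scripted in the same way. As before, this sketch assumes the configuration must be writable while the attributes change; the function names are hypothetical, and wemFailover and wemService are the group and resource names from the main.cf example.

```shell
# with_writable_config CMD [ARGS...]
# Makes the VCS configuration writable, runs CMD, then saves the
# configuration and makes it read-only again. Returns CMD's exit status.
with_writable_config() {
  haconf -makerw
  "$@"
  rc=$?
  haconf -dump -makero
  return $rc
}

# Step 1 (GROUP NODE2): keep GROUP from failing over to NODE2.
vcs_uninstall_step1() {
  with_writable_config hagrp -disable "$1" -sys "$2"
}

# Step 2 (RESOURCE): disable the WEM application resource.
vcs_uninstall_step2() {
  with_writable_config hares -modify "$1" Enabled 0
}

# Step 3 (GROUP NODE1 NODE2): allow GROUP on NODE2 again, keep it off NODE1.
vcs_uninstall_step3() {
  with_writable_config hagrp -enable "$1" -sys "$3" || return 1
  with_writable_config hagrp -disable "$1" -sys "$2"
}
```

For example: vcs_uninstall_step2 wemService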
 
 

Cisco Systems Inc.
Tel: 408-526-4000
Fax: 408-527-0883